Finite population size effects in quasispecies models with single-peak fitness landscape 
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We consider finite population size effects for Crow-Kimura and Eigen quasispecies models with single peak 
fitness landscape. We formulate accurately the iteration procedure for the finite population models, then derive 
Hamilton-Jacobi equation (HJE) to describe the dynamic of the probability distribution. The steady state solu- 
tion of HJE gives the variance of the mean fitness. Our results are useful for understanding population sizes of 
virus in which the infinite population models can give reliable results for the biological evolution problems. 
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I. INTRODUCTION 

Investigation of biological evolution models QHg], such as 
Eigen model iH 01 and Crow-Kimura model @], by meth- 
ods of statistical or theoretical physics is highly fruitful in 
evolution research. The methods used include quantum me- 
chanics [9 [^statistical mechanics jToll - |[l2tl . quantum field 
theory lflCT4l4ll,lfl9ll, Hamilton-Jacobi equation (HJE) lfl5r - 
[l7tl . Such approach has given many exact results for evolution 
models I3II- II20I1 . solved a paradox of the origin of life I2H1 . 
and produced exact finite genome length corrections for the 
mean fitness and gene probabilities in some evolution models 
101. 



In the original formulation of the Eigen and Crow-Kimura 
models, the configurations of the genome of length L are rep- 
resented by M = 2 L spin configurations (s 1; s 2 , ■ ■ ■ , s^), 
where Sk for 1 < k < L take +1 or —1. Such representa- 
tion was used by Peng, et al. to study long-range correlation 
in nucleotide sequences f23ll . The M configurations Si are 
labelled by < i < M — 1 and the i— th configuration Si 
is assigned a number i\ to represent the reproduction rate or 
fitness of that configuration and another number pi to repre- 
sent the probability in that configurations. Such pi satisfy the 
normalization condition: X^o* Pi = 1- The coupled differ- 
ential equations satisfied by pi for the Crow-Kimura model JH] 
and Eigen model lUlQl are given in Appendix A and Appendix 
B, respectively. However, such coupled differential equations 
are valid only in the limit of infinite population size N, which 
is not the case in many real systems, e.g. virus in a given 
environment. Thus the study of finite population size prob- 
lem has attracted much attention in recent decades ll24ll - ll32ll . 
While the case of two alleles (types of genes) in the Wright- 
Fisher model (It] and the Moran model J2l can be analytically 
solved @], the realistic case of evolution with many sequences 
(genomes) stays intractable by traditional methods. In 12611 the 
additive fitness landscape has been considered, when the con- 
tributions of different alleles to the fitness are random num- 
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bers and in [12711 a finite population was considered in which 
the finesses of different sequences are independent random 
numbers. 

The purpose of the present Letter is to formulate Crow- 
Kimura model and Eigen model for finite population size and 
solve them for the single peak fitness landscape, popular in 
quasispecies literature. In such a landscape, the fitness of a 
configuration, say So, is larger than fitness of other configura- 
tions, i.e. ro > ri for i > 0, and all r; are equal for i > 1. We 
first formulate the iteration procedure for the finite population 
models, then derive HJE to describe the dynamic of the prob- 
ability distribution. The steady state solution of HJE gives 
the variance of the mean fitness. Our results are useful for 
understanding population sizes of virus in which the coupled 
differential equations can give reliable results for biological 
evolution problems. Our results are exact derivations, versus 
numerics or uncontrolled approximations in vast majority of 
finite population articles. 

Consider the case when the total number of different geno- 
types N g is either N g ~ 4 L , where L is the number of nu- 
cleotides in the virus genome, or N g ~ 2 L , where L is the 
number of two type of alleles or (spins), located in different 
places (loci). The infinite population case is when the popu- 
lation size is large enough to have large number of viruses 
of any type. The convergence of evolutionary dynamics with 
population size depends on the mutation rate and the fitness 
landscape. In the infinite population limit the evolution equa- 
tions are deterministic, and, for molecular evolution models 
lHt]-|H], there are many exact results I8tl- lfl9ll . It is possible 
even to find exact solutions for the steady state and dynamics 
lfl5l [l6l [Tel . In biology, the populations are often relatively 
small. Then the collective characteristics of the evolving pop- 
ulation will fluctuate. 



II. FINITE POPULATION CROW-KIMURA MODEL WITH 
SYMMETRIC FITNESS LANDSCAPE 

We consider the symmetric fitness landscape, popular in vi- 
rology. The genome is a collection of letters (spins) ±1, de- 
noting the alleles type. Thus a sequence is identified with a 
spin configuration of a one-dimensional Ising model. By mu- 
tation any letter may randomly change to the other value. 
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An important characteristics of the sequence space is the 
number of neighbors L and the number of sequences at the 
Hamming distance d (sequences have d alleles different from 
the reference sequence) Nd = grn^rgji ~ "§r f° r two loci 
case at d <C L. We consider a simple fitness landscape, pop- 
ular in population genetics, when the sequence from the l-th 
Hamming class has a fitness wi. 

As evolution is a stochastic process, we should work with 
probabilities. We are interested in steady state properties of 
the evolution model. We consider the finite population ver- 
sion of Crow-Kimura model, see the Appendix. In this case 
we consider the evolution as a Markov process, where the state 
(the state of the population is characterized by the number of 
individuals for the different possible numbers of mutations) 
is defined by L + 1 integers, the numbers of different Ham- 
ming types. During one evolution step, there are three pro- 
cesses: birth of new individuals proportional to the fitness of 
the corresponding Hamming class, transitions between Ham- 
ming classes proportional to I /L to the lower Hamming class 
and transitions proportional to (L — l)/L to the higher Ham- 
ming class JUI^]. These factors l/L and (L — l)/L have 
been derived in 19[ |20ll for the infinite population models, and 
should be applied to the discrete time scheme of the finite pop- 
ulation models as well. The iteration step is completed by the 
random reduction of the population to the initial size, N. This 
evolution dynamics described here corresponds to the Moran 
model with many alleles. Compared to the ordinary multi- 
allele Moran model |0], in our case there is a non-trivial ge- 
ometry in sequence space, defined by Hamming distance. 

We first define our model for the case of a general symmet- 
ric fitness landscape with Wrightian fitness fi = e Ur ' , in the 
l-th Hamming class, < I < L, where r; is the fitness defined 
earlier. Here the average number of mutations of genome per 
one replication period is U = 70 t, where 70 is a mutation 
rate per genome in the continuous-time parallel model, and t 
is the time step. At any moment the state of our model is char- 
acterized via L + 1 integers n.;. We choose 70 = 1, therefore 
U = t. We consider the U <C 1 limit. In this case the steady 
state results and dynamics are U independent (U gives just the 
scale). 

During the iteration step we consider the following pro- 
cesses: 

• A. Random growth with ni —> m + Sni. The Sni is a 
random binomial process with probability Ufi and n; 
trials. 

• B. Mutations. 

There are // forward mutations from the class /. 

We consider random integer number fi with binomial 
distribution with the probability parameter (1 — j/)U 
and ni trials. There are bi back mutations from the class 
I. 

We consider random integer number bi with binomial 
distribution with the probability parameter j-U and n; 
trials. Due to back mutations we have the following 
change of nr. m ->• Hi - h + 6j+i. 



• C. We randomly remove J^i & n i individuals from the 
population to keep a fixed population size. 



HI. SHARP PEAK (SINGLE PEAK) MODEL 

Consider the Wrightian fitness with tq = e e J for the peak 
sequence, U << 1 is the number of mutations per genera- 
tion, J is a fitness gap in the corresponding continuous-time 
parallel selection-mutation model, and 7^ = 1 for i > 1. Our 
goal is to investigate how the mean fitness depends on finite 
population size. 

In case of infinite population, one can calculate the num- 
ber of viruses with the peak sequence using a single equation, 
with the l/L accuracy. Assume that there is some proba- 
bility distribution p(n) for the number n of viruses with the 
peak sequence, which satisfies the normalization condition 
~52n=o P( n ) = Then we can derive both the steady state dis- 
tribution, which is a rather simple function, and even the exact 
dynamics, which is a complicated expression for the p(n). 

We consider a discrete time scheme of evolution with small 
U. During each iteration we consider the steps A, B, C. In 
Step A, there are Sn new viruses at the peak sequence. The 
number Sn is derived via a binomial n sampling with a small 
probability UJ. During the step C of reduction to keep a con- 
stant population size anyone of this Sn viruses could be re- 
moved from the system. The total number of removed viruses 
from the peak sequence m is calculated via binomial distribu- 
tion with Sn trials and the probability x = n/N . Therefore, 
the result of A and C steps should be a sampling of n parti- 
cles with a probability U J(l — x). Thus after steps A and 
C the original n changes as n —> 11 + h, where h has a bino- 
mial distribution with a probability parameter^ = UJ(1 — x), 
and the number of trials is n. During the step B of mutation, 
n —¥ n — m, where m has a binomial distribution with a prob- 
ability parameter U and the number of trials is n. Thus after 
one iteration 77 — > 11 + h — m. 

If we have a distribution p(t, n) at the t-th moment of time, 
then after an iteration with the period of time U we have a 
distribution: 

p(t + U, n) =< p(t, n — h + m) > (1) 

when the averaging is over the (binomial) distributions of h 
and m. 

Let us assume the following anzats for the probability dis- 
tribution at the time t: 

p(t, n) = cxp[N(j)(t, x)], x = n/N. (2) 

After an iteration 

e Nt(t+U,x) = J dte N<P(x) < e -{h- m W{t,x) > (3) 

where cj)' = . As we used binomial probability dis- 

tributions in the iteration step, we should perform an average 
via the binomial distribution in Eq. (3). We use the following 
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h\(M - h)\ 



formula of the binomial distribution of the h with a success 
probability p and M trials: 

M 

<e kk > ^j2 ehk p h ( i -py 

h=0 

w^'- 1 ). (4) 

We consider the case of small p<l. 

Taking k = ~4>', M = Nx and p = UJ(1 - x) in Eq.(4) 
(see the definition of iteration steps A,C) we find 

< e"^' >= [(1 - x)UJe^' + 1 - UJ(1 - x)] Nx 

^ NUJx(l-x)(e~*' -1) 

In the same way we consider the mutation, taking k 

(j)',p= U, M — xN we derive 



(5) 



< e*'" 1 >= [Ue*' + 1 - U] xN = cxp[NUx(e' t '' - 1)]. (6) 

Combining Eqs.(5),(6) and holding only the linear terms in U, 
we obtain the following expression 

4>{t + U,x) = <p{t,x) + UxJ{l-x){e-'l' 1 -l) + Ux(e' t '' -1) 

(7) 

or 

d<j>(t, x) 



at 



xJ{l-x){e- 1 '' -l)+x(e? -1). (8) 



In the steady state we just have an ordinary differential 
equation for <fi. We derive the following nontrivial solution 
<j>o(x) = 4>(oo, x) and the corresponding distribution 



<po(x) = / d£c1n.J(l—x) = (x — xo)hiJ + 
(1 - x){\ - ln(l - x)) - (1 - x )(l - ln(l - xq)), 



p{x) 



NJ 



cxp[AT0(x)], (9) 



where we added the pre-factor y to ensure the condition 
that total probability is 1. Here the distribution has a maxi- 
mum at x = xq = (1 — 1/J), see ifioll . and 4>(xq)" = —J. 
Then we derive for the variance: 

V = (< x 2 > - < x > 2 )N = -. (10) 

Thus we derived the whole steady state distribution via Eqs. 
(Illi,®, and the expression for the variance Eq.dTob. 

Equation dTOb is verified numerically in Fig. 1 for J = 
1.5, 2, 3, 4. One could follow the method used in IU8I1 to 
solve Eq. ([8j and get time evolution of <f){t, x). 



At any discrete moment of time we consider three pro- 
cesses: 

A. the number of viruses in the class I grows with a probabil- 
ity Uri There are mutations. New viruses mutate with a finite 
mutation probability 1 — Q, 

C. There is a dilution of the whole population, keeping strictly 
the total population size as N. 

Consider again the single peak fitness, tq = A, and for 
I > 0, r; = 1. n is the number of the viruses with the peak 
sequence, and x = n/N . 

Let us give the details of the processes A and B. 

Al. Reproduction in the peak sequence So'. We randomly 
choose I elements from a pool of n elements and each ele- 
ment is chosen independently with a probability U A. Thus 
the probability to get I elements is 



MO 



ll(n-l)\ 



(UA) l (l - UA) 



n—l 



(ID 



l is the number of new sequences at the peak sequence. 
A2. Reproduction in the other sequences, i.e. Si fori > 0: We 
randomly choose k elements from a pool of (N — n) elements 
and each element is chosen independently with the probability 
U. Thus the probability distribution to get k elements is 



M*0 



(N n )l _l/k^ _ jj-^N-n-k 



kl(N-n-k)l 



(12) 



After Al and A2 steps there are n + I viruses at the peak 
sequences and N — n + k sequences at other sequences. 

B. We randomly choose m elements from a pool of I elements 
in So and each element is chosen independently with the prob- 
ability Q = exp[— 7] to be in So- Thus the probability to get 
m elements in So is: 



Mm) 



/! 



m\{l — m)\ 



Q m (i-Q) 



l—m 



(13) 



After the step B, there are n + m viruses in the peak sequence 
So and N — n + k + (I — m) viruses in other sequences. Thus 
there are N + k + I sequences in S; for 1 < i. In the next 
step, we will uniformly remove I + k sequences so that the 
total population is still N. 

C. We randomly choose h elements from a pool of I + k el- 
ements in So and each element is chosen independently with 
the probability x. Thus the probability to remove h elements 
from So is: 



IV. FINITE POPULATION VERSION OF THE EIGEN 
MODEL 



MM 



x h (i - x f l+k -v 
h\(i + k~h)\ 



(14) 



Consider now the finite population version of the Eigen 
model with zero degradation. There are n viruses at the peak 
sequence. 



Besides, we remove l + k — h elements from Si for i > 0. We 
have that during one iteration step n — > n + m — h, therefore 
we need to find the average < ( m ~ h ) > via the distribu- 
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tions px(l)p2(k)p3(m)pi(h). We consider: 

E 



\{UA) l {\ - UA) n - 1 l\Q m (l - Q) l - m e-' t '' m 



l,k,m,h 



l\(n-l)\ 



m\(l — m)\ 



(N - n)\U k {l - u)N-n-k (j + k y x h^ _ ^(i+k-h) ^ 

k\(N-n-k)\ h\(l + k-h)\ " 

First we transform 



\15) 



l1^) m (^ — ny- m P -<P' m 

E u, v = {Qe-*' + l-Q) 1 . (16) 

m!(/ — ml! 

m v ' 

Using the transformation 



we obtain 

n!(£M)'(l - (TV - n)![/ fe (l - U) N ~ n - k 



E 



Z!(n-Z)! 



fel(iV-n-fc)! 



(Qe - *' + 1 - - a; + xe^ ) i+fe (.18) 
The sum over I, k gives an equation 



(19) 



where 



F(^) = xA[(Qe-*' + l-Q)(xe*' + l-a?)-l] 

+ x(l -x)(e ' - 1). (20) 

We need to consider the first two terms in the <f>' expansion 

F{4>') Ps -x[(QA - 1) - (A - l)x](f>' 

6 12 

+x[QA(l-2x) + (A-l)x + l]^-. (21) 



In the steady state we consider F (</>') = 0. We expand Eq. (21) 
in powers of y = x — t^Ew an d nn d tne following steady 
state solution: 

(A-l)y 



<t> = -2 
Therefore, 



Q{l-Q)^T ) -^QA+l-A)y 



'(0) 



(A -I) 2 



o(i -g)A 2 ' 

and eventually we obtain for the variance V of distribution 



(23) 



V = N < y 2 >= N(< pi > - < po > 2 ) 



Q(l - Q)A- 
(A — l) 2 



(24) 



In Appendix C, we derive the steady state distribution and the 
variance for the Eigen model with degradation Eq. (C6). 
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FIG. 1: Verification of the Equation d 1 0b for variance: V = 1/J. 
The horizontal lines from top from bottom are analytic results for 
J — 1.5, 2, 3 and 4, respectively. Circles, squares, diamonds, and 
triangles represent numerical data for J = 1.5, 2, 3 and 4, respec- 
tively. Analytic and numerical results are quite consistent with each 
other. 



V. DISCUSSION 

The investigation of finite population problem is the hard- 
est mathematical problem in evolution theory. In this article 
we solved exactly some aspects of finite population version of 
Crow-Kimura and Eigen model with degradation. We calcu- 
lated the variance of the distribution for the mean fitness in 
the equlibrium. Our equation could be applied to calculate the 
dynamics of the distribution as well. 

The quasispecies model, especially the one with single peak 
fitness and its simple generalizations, has a lot of applications 
in the virus evolution ll33ll . cancer modeling l34ll and molecu- 
lar evolution l35ll . Therefore any rigorous results here should 
be welcomed. 

In this article we considered just one aspect of convergence 
of finite population result to the results in infinite population 
considering the variance of the mean fitness. According this 
criteria, N ~ L 2 is large enough to have the same mean fit- 
ness as the infinite population with the accuracy 1/L. Ac- 
tually an important open problem is to investigate the equi- 
librium here (mutation-selection), like the equilibrium in the 
thermodynamics, and how the equilibrium is affected by finite 
size of population. 
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VIII. APPENDIX C. EIGEN MODEL WITH 
DEGRADATION 



VI. APPENDIX A. CROW-KIMURA MODEL 

We here consider the infinite population model. The M = 
2 L genome configurations (sequences) are defined as chains 
of L spins Sk, 1 < k < L having values ±1. There is a 
reference sequence 5 with all spins +1. We define the Ham- 
ming distance between the given sequence and the reference 
sequence by — Sfe)/2 = N(l — m)/2, where to is an 

overlap. 

The state of system is specified by the M relative frequen- 
cies pi, < i < M — 1: 



dt 



A 



8 u r, + m-i 



V ~ l J 3 



(A.l) 



Here my is the rate of mutation from the sequence j to the 
sequence i,and r t is the fitness. Two sequences have a Ham- 



ming distance dij = (L — J2k s i s j)/^- Here to 



When d^ = 1, mi 



70 /L, mi 



for da > 1 



interested in the symmetric fitness landscape with 

n = f(i - 2d i0 /L). 



= -Jo- 
We are 



(A.2) 



We choose the first L+l sequences such that the Z-th sequence 
has Hamming distance I from the reference sequence, < / < 
L. Then our r; are connected with the fi in the main text as 



where 



h = exp[nf/]. 



U = T70, 



(A.3) 



(A.4) 



where r is the iteration duration and / in Eq.(A3) is the Ham- 
ming class of the i-th sequence. In the main text we consider 
discrete time evolution with minimal time interval r = U, 
choosing 7 = 1. 



VD. APPENDIX B. EIGEN MODEL 

In Eigen's quasispecies theory JH 01, the i-th sequence 
produces offspring of the type j with the probability Qij = 
qL-dij ^ _ q^dtj f where 1 — q is the average number of errors 
per site and dij is the Hamming distance . 

Eigen proposed that pi satisfy 



dp 



= {Qiir i -D i Y,hPk{t)}Pi{t) 



y^^QikrkPkjt). 



(B.l) 



Here Di describes degradation. It is convenient to work with 
the error rate 7 = L(l — q), leading to Q = e -7 . 



We now consider the Eigen model when there is degrada- 
tion D in the peak sequence So, and zero degradation for other 
sequences Si for i > 0. In this case we should add the random 
sampling for the degradation case. The calculation procedure 
is similar to the case in the section of Eigen model. We should 
just modify the iteration sub-steps from that section, after the 
point B. 

C. There is a dilution of the population with the peak se- 
quence. We randomly choose t elements from a pool of n 
elements and each element is chosen independently with the 
probability U D. 

D. There is a dilution of the whole population, keeping 
strictly the total population size as N. 

We randomly choose h elements from a pool of I + k — t 
elements and each element is chosen independently with the 
probability {7(1 — x). 

Now after one iteration step n — > n + m — h — t. Thus 
we should calculate < ( m ~*-' 1 ) >. We get the following 
expression: 



< e -{™-t-h)<t>' ~ > _ 



E : 

l.k.m.t.h 



i\{UA) l {l - UA) n - 1 l\Q m (l - Q) l - m e-^" 



l\(n-l)\ 



m\(l — to)! 



n\e'*'' t e (t, ' h \UD) t (l - UD)"-* 
t\(n-t)\ 

(N - n)\U k {l - U) N - n - k (l + k- t)\x k (l - x )(i+k-t-h) 



k\{N-n-k)\ h\{l + k-t-h)\ 

We first perform the sum over h: 

^{l + k- t)\x h (l - x )(l+k-t-h) ^ 

^ h\{l + k-t-h)\ 6 

= (l + x(e*' -l))' +fc -*, (C.2) 
then perform the sum over t: 



(€.1) 



^ n!e^'* (UDY (1 - UD) n ~ t 



t\(n-t)\ 



+ x{e* -1))-* = 



Nx 



(l + x(e^' - 1)) 



exp[Ux( 



o4>' 



1 + x(e<f - 1 



+ 1 — 17) 



- l)rf7V](C3) 



Comparing our formulas with the expression of F ((/>') from 
the section of Finite population Eigen model, we find just new 
additional term to those of Eq.(20). Eventually we have: 



= xA\{Qe-V + 1 - Q)(are*' + 1 - x) - 1] 



of 



1 + xie'P - 1 



1)D + (1 - x)x{e' t ' -1). (C.4) 
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We expand in powers of <f>': and obtain for the variance V 

W) « -KQ^ 4 - 1 - -D) - (A - 1 - D)a#' + 

0' 2 

-V[QA(1 - 2a;) + W - l)x + 1 + D(l - x)(l - 2xJlC.5) 

2 T/ A(l - Q)(Q4 - 1)AQ + 2AQd - d- d 2 ) 

= 7~j ; .(C.O) 

Putting the value of x = ^E§=y > we derive [A — 1 — a) 

F{4>') « -[(Qi4 -1-D)-(A-1- d)a#' 

j./2 /o c i i r^Min i n2 i c i i i o n^>\\\ For D = 0, Equation JC.61 > reduces to Equation d20li for the 

0" (2a(-l + (5)((D + D z + (-1 + a)a<3 - 2aD<9))) ' 1 " ' 4 

— . Eigen model without degradation. 



(l + Z?-a) 2 
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